Homework 12: Make Your Mark with Maps

Layered Maps, Choropleth Maps, Map Projections, Animation

This week, we’ll layer points and other geometries on maps, create choropleth maps, learn more about map projections, and create animated plots in ggplot().

alt text

alt text



Problem 2: Mapping US Flights

(40 points total)

Base Map

  1. (5 points) Load the airports and routes datasets from Lecture 23. Use ggmap to create a map centered on the United States. Add points corresponding to the location of each airport, sized by the number of arriving flights at that airport.
library(plyr)
library(dplyr)
library(ggmap)
airports <- read.csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat", header = FALSE)
colnames(airports) <- c("ID", "name", "city", "country", "IATA_FAA", "ICAO",
                        "lat", "lon", "altitude", "timezone", "DST")

routes <- read.csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat", header=F)
colnames(routes) <- c("airline", "airlineID", "sourceAirport",
                      "sourceAirportID", "destinationAirport",
                      "destinationAirportID", "codeshare", "stops", "equipment")

departures <- ddply(routes, .(sourceAirportID), "nrow")
names(departures)[2] <- "flights"
arrivals <- ddply(routes, .(destinationAirportID), "nrow")
names(arrivals)[2] <- "flights"

airportD <- merge(airports, departures, by.x = "ID", 
                  by.y = "sourceAirportID")
airportA <- merge(airports, arrivals, by.x = "ID", 
                  by.y = "destinationAirportID")
map <- get_map(location = "United States", zoom = 4)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=United+States&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=United%20States
ggmap(map) + 
  geom_point(aes(x = lon, y = lat, size = flights), data = airportA, 
             alpha = .45) +
  labs(x = "Longitude", y = "Latitude") +
  ggtitle("Flight Arrival Locations") +
  scale_size_continuous(name = "Number of\nArrivals", trans="sqrt",
                        breaks=c(10, 50, 200, 800))



Routes to and from Atlanta

  1. (20 points) Recreate your plot in (a). This time, manipulate the routes and airports datasets so that you can use geom_segment() to draw a line connecting each airport for each flight listed in the routes dataset. That is, draw a line that connects the departing airport and the arrival airport. Do this only for flights that either depart from or arrive at the Atlanta (ATL) airport.
atl_routes <- subset(routes, routes$sourceAirport == "ATL" |
                       routes$destinationAirport == "ATL")
atl_airports <- merge(atl_routes, airports, by.x = "sourceAirport",
                      by.y = "IATA_FAA")
atl_airports <- atl_airports[, c("destinationAirport", "lat", "lon",
                                 "timezone")]
colnames(atl_airports) <- c("destinationAirport", "source_lat", "source_lon", 
                            "source_timezone")
atl_airports <- merge(atl_airports, airports, by.x = "destinationAirport",
                      by.y = "IATA_FAA")
atl_airports <- atl_airports[, c("source_lat", "source_lon", "source_timezone", 
                                 "lat", "lon", "timezone")]
colnames(atl_airports) <- c("source_lat", "source_lon", "source_timezone", 
                            "dest_lat", "dest_lon", "dest_timezone")

ggmap(map) + 
  geom_segment(aes(x = source_lon, y = source_lat, xend = dest_lon,
                   yend = dest_lat), data = atl_airports, alpha=.15) +
  labs(x = "Longitude", y = "Latitude") +
  ggtitle("Flights to and from Atlanta")



Arrow Route Diagram

  1. (5 points) Recreate your plot from (b). Change the lines on your plot so that they are now curved arrows indicating the start and end points of each flight route. See the help documentation for geom_segment() here for how to do this. Do this only for flights that either depart from or arrive at the Atlanta (ATL) airport.
ggmap(map) + 
  geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
                 yend = dest_lat), data = atl_airports, 
             arrow = arrow(length = unit(0.02, "npc")), alpha = .25,
             curvature = .4) + coord_cartesian() + 
  labs(x = "Longitude", y = "Latitude") + ggtitle("Flights to and from Atlanta")



Timezone Differences

  1. (5 points) Calculate the change in time zones for each flight. When doing this, New York City to Los Angeles should be +3, and Los Angeles to New York City should be -3. Recreate your graph in (c). This time, color the lines by the change in time zones of each flight. Use a color gradient to do this. Be sure to include a detailed legend. Do this only for flights that either depart from or arrive at the Atlanta (ATL) airport.
ggmap(map) + 
  geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
                 yend = dest_lat, color = source_timezone - dest_timezone), 
             data = atl_airports, arrow = arrow(length = unit(0.02, "npc")),
             alpha = .25, curvature = .4) + coord_cartesian() + 
  labs(x = "Longitude", y = "Latitude") + ggtitle("Flights to and from Atlanta") +
  scale_colour_gradient(name = "Timezone\nDifference", limits = c(-3, 3),
                        low = "#FFC60D", high = "#810000")

For more interesting examples and for an in-depth description of ggmap, see the short paper by David Kahle and Hadley Wickham here.

Another good resource is here.



Problem 3: Choropleth Maps

(6 points each; 42 points total)

Choropleth maps are maps in which geographic regions (e.g. countries, states, counties, tracts, etc) are colored by a measured/statistical quantity. We’ll create some choropleth maps here.

part a.

Code is provided for the following tasks:

  • Read the 2009 Unemployment data from this link.
  • Adjust the column names of the dataset
  • Manipulate the state and county columns
unemp <- read.csv("http://datasets.flowingdata.com/unemployment09.csv", 
                  header = F, stringsAsFactors = F)
names(unemp) <- c("id", "state_fips", "county_fips", "name", "year", 
                  "?", "?", "?", "rate")
unemp$county <- tolower(gsub(" County, [A-Z]{2}", "", unemp$name))
unemp$state <- gsub("^.*([A-Z]{2}).*$", "\\1", unemp$name)

part b.

Code is provided for the following tasks:

  • Create two data.frames with the results of map_data() for US counties and for US states
  • Adjust the column names of the county data.frame
  • Use R’s state.abb and state.name objects to add proper abbreviations to a new variable, called state, in the county data.frame
library(ggmap)
county_df <- map_data("county")
names(county_df) <- c("long", "lat", "group", "order", "state_name", "county")
county_df$state <- state.abb[match(county_df$state_name, tolower(state.name))]
county_df$state_name <- NULL
state_df <- map_data("state")

part c.

Code is provided for the following tasks:

  • Merge or left-join the county_df and usemp data.frames using the state and county variables.
  • Add a new variable to the merged data.frame, called rate_discrete, that partitions the existing rate variable into 9 groups.
  • Sort the data.frame by the order variable, so that you can plot the map correctly.
choropleth_df <- merge(county_df, unemp, by = c("state", "county"))
choropleth_df <- choropleth_df[order(choropleth_df$order), ]
choropleth_df$rate_discrete <- cut_interval(choropleth_df$rate, 9)

part d.

Use ggplot() to create a choropleth map of US counties. The code is started for you below.

  • Fill each county with the discretized unemployment rate variable.
  • Use an appropriate color gradient.
  • Draw borders corresponding to the state and county borders.
  • Include an appropriate title and a legend
ggplot(choropleth_df, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = rate_discrete)) +
  geom_polygon(data = state_df, col = "white", fill = NA) +
  ggtitle("US Unemployment Cholropleth Map") + xlab("Longitude") + ylab("Latitude") +
  scale_fill_brewer(palette = "YlGnBu")

part e.

Interpret your graph. In what areas of the United States is unemployment highest? In what areas is it lowest? Are there any noticeable geographic trends or patterns? Does anything else stick out to you in the graph?

There is very low unemployment in the center of the United States. Unemployment rates tend to increaase as we move towards the West, East, and South. It seems to be highest in California, Oregon, and Michigan as well as in states north of (and including) Mississippi, Alabama, Georgia, Florida, and South Carolina.

part f.

Look at the help file for the coord_map() function. Name at least 5 different map projections that are available to be used, and include the description of each one. (Note: You’ll have to also look at the help documentation for the mapproject() function.)

mercator(): equally spaced straight meridians, conformal, straight compass courses

sinusoidal(): equally spaced parallels, equal-area, same as bonne(0)

cylequalarea(lat0): equally spaced straight meridians, equal-area, true scale on lat0

cylindrical(): central projection on tangent cylinder

rectangular(lat0): equally spaced parallels, equally spaced straight meridians, true scale on lat0

  1. Recreate your graph from (d) twice, each time with a different map projection. Comment on any noticeable differences.
ggplot(choropleth_df, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = rate_discrete)) +
  geom_polygon(data = state_df, col = "white", fill = NA) +
  ggtitle("US Unemployment Cholropleth Map") + xlab("Longitude") + ylab("Latitude") +
  scale_fill_brewer(palette = "YlGnBu") +
  coord_map("mercator")

ggplot(choropleth_df, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = rate_discrete)) +
  geom_polygon(data = state_df, col = "white", fill = NA) +
  ggtitle("US Unemployment Cholropleth Map") + xlab("Longitude") + ylab("Latitude") +
  scale_fill_brewer(palette = "YlGnBu") +
  coord_map("sinusoidal")

The mercator projection tends to stretch geographics regions farther away from the equator, making these regions appear to be larger than they actually are. The sinusoidal projection is an “equal area” projection, meaning this distortion of the area of geographic regions is not an issue here. Instead, the lines of latitude are slanted to ensure that all geographic regions appear in the graph to have area proportional to their actual area. The differences are most pronounced in the northern regions of the US, where the distortion from the mercator projection is highest.



Problem 4: World War II Data Visualization Video

(10 points)

Watch this video. What do you like about it from a data visualization perspective (1-3 sentences)? What do you dislike, if anything (1-3 sentences)?

Things I liked about the video were its overall comparison of human lives lost using the people figures, and the separation in the bar graphs for important battles so you could see what proportion of the deaths were from this event. I also enjoyed the interactive feature were you could hover over the bars in the stacked bar chart and see were all the death were coming for. I also enjoyed the real time “we are here” at the end. Overall, a very well done video.

The only thing I didn’t like at the beginning was that most of the explaining was done with audio and not visuals, so harder to pick up on, but that changed as they got into more of the data.